Maharashtra District Geographical Analysis
import pandas as pd
import seaborn as sns
import numpy as np
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
%matplotlib inline
from datetime import date
!pip install plotly
import plotly.express as px
import plotly.graph_objects as go
data = pd.read_csv('maharashtra-districts.csv', encoding= 'latin-1')
data.head()
| | District Name | District Code | Administrative Division | Headquarters | Number of Talukas | Area (in sq. km) | Population (Census 2011) | Population Density (per sq. km) | Sex Ratio | Literacy Rate (%) | Urban Population (%) | Formation Date | Geographical Coordinates (Latitude and Longitude) | Major River(s) | Major Crop(s) | Key Industries/Economy | Tourist Attractions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Pune | PU | Pune Division | Pune | 14 | 15643 | 9429408 | 603 | 915 | 86.15 | 60.5 | 1 May 1960 | 18.52° N, 73.85° E | Bhima, Mula, Mutha, Indrayani | Sugarcane, Jowar, Bajra, Grapes, Onions | IT & ITeS, Automotive, Manufacturing, Educatio... | Shaniwar Wada, Aga Khan Palace, Sinhagad Fort,... |
| 1 | Satara | ST | Pune Division | Satara | 11 | 10480 | 3003741 | 287 | 988 | 82.87 | 21.9 | 1 May 1960 | 17.68° N, 74.01° E | Krishna, Koyna, Venna | Sugarcane, Jowar, Soybean, Turmeric | Agriculture, Wind Power, Sugar Factories, Tourism | Kaas Plateau, Mahabaleshwar, Panchgani, Thoseg... |
| 2 | Sangli | SN | Pune Division | Sangli | 10 | 8572 | 2822143 | 329 | 966 | 81.48 | 25.5 | 1 May 1960 | 16.85° N, 74.58° E | Krishna, Warana | Sugarcane, Grapes, Turmeric, Jowar | Sugar Production, Turmeric Processing, Textile... | Sagareshwar Wildlife Sanctuary, Chandoli Natio... |
| 3 | Solapur | SO | Pune Division | Solapur | 11 | 14895 | 4317756 | 290 | 938 | 77.02 | 32.4 | 1 May 1960 | 17.68° N, 75.90° E | Bhima, Sina, Man | Jowar, Sugarcane, Pomegranate, Pulses | Textiles (Chaddars), Sugar Factories, Beedi In... | Siddheshwar Temple, Akkalkot Swami Samarth Mah... |
| 4 | Kolhapur | KO | Pune Division | Kolhapur | 12 | 7685 | 3876001 | 504 | 957 | 81.51 | 31.7 | 1 May 1960 | 16.70° N, 74.24° E | Panchganga, Krishna, Dudhganga | Sugarcane, Rice, Soybean, Jaggery | Sugar Mills, Foundries, Textiles, Kolhapuri Ch... | Mahalakshmi Temple, Panhala Fort, Jyotiba Temp... |
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36 entries, 0 to 35
Data columns (total 17 columns):
 #   Column                                             Non-Null Count  Dtype
---  ------                                             --------------  -----
 0   District Name                                      36 non-null     object
 1   District Code                                      36 non-null     object
 2   Administrative Division                            36 non-null     object
 3   Headquarters                                       36 non-null     object
 4   Number of Talukas                                  36 non-null     int64
 5   Area (in sq. km)                                   36 non-null     int64
 6   Population (Census 2011)                           36 non-null     int64
 7   Population Density (per sq. km)                    36 non-null     int64
 8   Sex Ratio                                          36 non-null     int64
 9   Literacy Rate (%)                                  36 non-null     float64
 10  Urban Population (%)                               36 non-null     float64
 11  Formation Date                                     36 non-null     object
 12  Geographical Coordinates (Latitude and Longitude)  36 non-null     object
 13  Major River(s)                                     36 non-null     object
 14  Major Crop(s)                                      34 non-null     object
 15  Key Industries/Economy                             36 non-null     object
 16  Tourist Attractions                                36 non-null     object
dtypes: float64(2), int64(5), object(10)
memory usage: 4.9+ KB
data.describe()
| | Number of Talukas | Area (in sq. km) | Population (Census 2011) | Population Density (per sq. km) | Sex Ratio | Literacy Rate (%) | Urban Population (%) |
|---|---|---|---|---|---|---|---|
| count | 36.000000 | 36.000000 | 3.600000e+01 | 36.000000 | 36.000000 | 36.000000 | 36.000000 |
| mean | 9.944444 | 8560.666667 | 3.119993e+06 | 1436.944444 | 947.333333 | 80.864167 | 33.916667 |
| std | 3.970926 | 4095.966971 | 2.155514e+06 | 4569.022377 | 46.746734 | 5.596462 | 22.232814 |
| min | 0.000000 | 157.000000 | 8.496510e+05 | 74.000000 | 832.000000 | 64.380000 | 11.000000 |
| 25% | 7.750000 | 5614.750000 | 1.655256e+06 | 240.750000 | 929.500000 | 77.200000 | 19.450000 |
| 50% | 9.500000 | 7701.500000 | 2.610229e+06 | 286.000000 | 944.500000 | 81.910000 | 26.350000 |
| 75% | 14.000000 | 10880.500000 | 3.744962e+06 | 366.500000 | 959.500000 | 84.635000 | 37.550000 |
| max | 16.000000 | 17048.000000 | 9.429408e+06 | 20980.000000 | 1122.000000 | 89.910000 | 100.000000 |
The mean and median are close for Number of Talukas, Sex Ratio and Literacy Rate, which suggests these columns are roughly symmetric; Urban Population (%) is less so (mean 33.9 vs. median 26.4). Population Density is strongly right-skewed (mean 1,437 vs. median 286, maximum 20,980), and Area shows a milder skew, as one would expect intuitively. At least one district has a 100% urban population. Half of the districts have a literacy rate below 81.91%. There are 36 districts in total.
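The symmetry claims above can be checked numerically rather than read off `describe()`. The sketch below compares mean, median and sample skewness on a toy frame; the values are illustrative only and are not the district data.

```python
import pandas as pd

# Illustrative values only (NOT the real district data): one symmetric column
# and one column with a single extreme value, similar to a dense city district.
toy = pd.DataFrame({
    "symmetric": [8, 9, 10, 10, 11, 12],
    "right_skewed": [200, 250, 280, 300, 350, 20000],
})

# A mean far above the median and a large positive skew both point to a
# right-skewed column; values near zero point to a roughly symmetric one.
summary = pd.DataFrame({
    "mean": toy.mean(),
    "median": toy.median(),
    "skew": toy.skew(),
})
print(summary)
```

On the notebook's own frame, the same idea would be `data.select_dtypes('number').skew()`.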
Q.1 Which district has the maximum number of talukas?
data[data['Number of Talukas'] == max(data['Number of Talukas'])]['District Name']
24      Nanded
33    Yavatmal
Name: District Name, dtype: object
A large rural population and a large area are probably why these districts are divided into so many talukas. However, this should not be overemphasized, as other factors may also influence the number of talukas.
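The Q.1 lookup returned two districts because the maximum is tied. As a hedged alternative to the boolean mask, `DataFrame.nlargest` with `keep="all"` also keeps every tied row. The small frame below uses only taluka counts that appear in the outputs above (the first five rows, plus Nanded and Yavatmal at the maximum of 16).

```python
import pandas as pd

# Taluka counts taken from the outputs above; this is a subset, not all 36 districts.
talukas = pd.DataFrame({
    "District Name": ["Pune", "Satara", "Sangli", "Solapur", "Kolhapur", "Nanded", "Yavatmal"],
    "Number of Talukas": [14, 11, 10, 11, 12, 16, 16],
})

# keep="all" returns every row tied for the largest value instead of just one.
top = talukas.nlargest(1, "Number of Talukas", keep="all")
print(top)
```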
Q.2 What's the relation between Literacy Rate and Sex Ratio?
sns.lmplot(x='Sex Ratio',y= 'Literacy Rate (%)', data= data)
plt.show()
data['Sex Ratio'].corr(data['Literacy Rate (%)'])
-0.1329962817499564
Interestingly, the fitted line slopes slightly downward, but the Pearson correlation of about -0.13 indicates only a very weak relationship. Other factors are likely to influence both variables, so little can be concluded from this pair alone.
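Because Pearson correlation only measures linear association and is sensitive to outliers, a rank-based (Spearman) coefficient is a useful cross-check. The sketch below computes it as the Pearson correlation of the ranks, using only the five rows shown by `data.head()`, so the numbers will not match the full 36-district result.

```python
import pandas as pd

# The five rows displayed by data.head() above (a subset, not the full data).
sample = pd.DataFrame({
    "Sex Ratio": [915, 988, 966, 938, 957],
    "Literacy Rate (%)": [86.15, 82.87, 81.48, 77.02, 81.51],
})

pearson = sample["Sex Ratio"].corr(sample["Literacy Rate (%)"])

# Spearman correlation is the Pearson correlation of the ranked values.
spearman = sample["Sex Ratio"].rank().corr(sample["Literacy Rate (%)"].rank())

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

If both coefficients stay close to zero on the full data, that supports reading the relationship as weak rather than merely non-linear.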
from matplotlib.ticker import FuncFormatter
def converter(x,pos):
    return f'{x/1e6:.1f}M'
df_sorted= data.sort_values('Population (Census 2011)', ascending= False)
order_list = df_sorted['District Name']
graph= sns.barplot(x='District Name', y='Population (Census 2011)', hue= 'District Name', legend= False,
order= order_list, palette='summer', data= data)
graph.yaxis.set_major_formatter(FuncFormatter(converter))
plt.xticks(rotation='vertical')
plt.title('Maharashtra Districts Population as per Census 2011')
plt.show()
Pune and Mumbai Suburban (not Mumbai City) have the largest populations as per Census 2011. Population varies widely across districts, which is consistent with the geography of Maharashtra: some regions face droughts and other problems that make them harder to live in.
Q.3 Comparing the sex ratio of each district with the mean sex ratio of Maharashtra.
state_avg = round(data['Sex Ratio'].mean(),2)
state_avg
947.33
sns.barplot(x='District Name', y='Sex Ratio', hue='District Name', legend= False,
palette= 'mako', data= data, order= order_list)
plt.axhline(state_avg, color='red', label='State_avg')
plt.ylim(min(data['Sex Ratio']) -100, max(data['Sex Ratio']) +100)
plt.xticks(rotation= 'vertical')
plt.legend()
plt.title('Comparing Sex Ratio of districts with the state average')
plt.show()
Clearly, a high population does not imply a high sex ratio. Even a small district like Nandurbar has a sex ratio well above the average, while Mumbai City, a highly urban district, has the lowest. The graph suggests that several perspectives and parameters should be taken into account when interpreting the sex ratio of a place. The approaches of smaller districts could be studied when initiating policies on this topic.
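One caveat about the reference line: `data['Sex Ratio'].mean()` is an unweighted mean over districts, so a small district counts as much as Pune. A population-weighted mean is a rough approximation of the state-level figure (it is still not the exact pooled ratio, which would need male and female counts). A minimal sketch on the five `data.head()` rows:

```python
import numpy as np
import pandas as pd

# The five rows displayed by data.head() above (a subset, not the full data).
sample = pd.DataFrame({
    "District Name": ["Pune", "Satara", "Sangli", "Solapur", "Kolhapur"],
    "Population (Census 2011)": [9429408, 3003741, 2822143, 4317756, 3876001],
    "Sex Ratio": [915, 988, 966, 938, 957],
})

simple_mean = sample["Sex Ratio"].mean()

# Weight each district's ratio by its population (an approximation only).
weighted_mean = np.average(sample["Sex Ratio"], weights=sample["Population (Census 2011)"])

print(f"Unweighted mean:          {simple_mean:.2f}")
print(f"Population-weighted mean: {weighted_mean:.2f}")
```

In this subset the weighted figure is lower, because the most populous district (Pune) also has the lowest sex ratio.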
data.head(1)
| | District Name | District Code | Administrative Division | Headquarters | Number of Talukas | Area (in sq. km) | Population (Census 2011) | Population Density (per sq. km) | Sex Ratio | Literacy Rate (%) | Urban Population (%) | Formation Date | Major River(s) | Major Crop(s) | Key Industries/Economy | Tourist Attractions | Latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Pune | PU | Pune Division | Pune | 14 | 15643 | 9429408 | 603 | 915 | 86.15 | 60.5 | 1 May 1960 | Bhima, Mula, Mutha, Indrayani | Sugarcane, Jowar, Bajra, Grapes, Onions | IT & ITeS, Automotive, Manufacturing, Educatio... | Shaniwar Wada, Aga Khan Palace, Sinhagad Fort,... | 18.52 | 73.85 |
Q.4 Understanding the literacy rate of the rural population of Maharashtra.
copied_data = data.copy()
Urban_pop = (copied_data['Urban Population (%)']/100)*copied_data['Population (Census 2011)']
copied_data['Rural_pop'] = copied_data['Population (Census 2011)'] - Urban_pop
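# Note: sort_values returns a new DataFrame; the result is not assigned here, so copied_data keeps its original order
copied_data.sort_values('Rural_pop', ascending= False).reset_index(drop=True)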
copied_data.head(2)
| | District Name | District Code | Administrative Division | Headquarters | Number of Talukas | Area (in sq. km) | Population (Census 2011) | Population Density (per sq. km) | Sex Ratio | Literacy Rate (%) | Urban Population (%) | Formation Date | Geographical Coordinates (Latitude and Longitude) | Major River(s) | Major Crop(s) | Key Industries/Economy | Tourist Attractions | Rural_pop |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Pune | PU | Pune Division | Pune | 14 | 15643 | 9429408 | 603 | 915 | 86.15 | 60.5 | 1 May 1960 | 18.52° N, 73.85° E | Bhima, Mula, Mutha, Indrayani | Sugarcane, Jowar, Bajra, Grapes, Onions | IT & ITeS, Automotive, Manufacturing, Educatio... | Shaniwar Wada, Aga Khan Palace, Sinhagad Fort,... | 3724616.160 |
| 1 | Satara | ST | Pune Division | Satara | 11 | 10480 | 3003741 | 287 | 988 | 82.87 | 21.9 | 1 May 1960 | 17.68° N, 74.01° E | Krishna, Koyna, Venna | Sugarcane, Jowar, Soybean, Turmeric | Agriculture, Wind Power, Sugar Factories, Tourism | Kaas Plateau, Mahabaleshwar, Panchgani, Thoseg... | 2345921.721 |
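The `Rural_pop` column is the population multiplied by the non-urban share, i.e. population × (1 − urban % / 100). The sketch below reproduces the two values shown in the table from the Pune and Satara rows, and assigns the sorted result back to the variable, since `sort_values` does not modify a frame in place.

```python
import pandas as pd

# The two rows displayed by copied_data.head(2) above.
sample = pd.DataFrame({
    "District Name": ["Satara", "Pune"],
    "Population (Census 2011)": [3003741, 9429408],
    "Urban Population (%)": [21.9, 60.5],
})

# Rural population = total population * (1 - urban share)
sample["Rural_pop"] = sample["Population (Census 2011)"] * (1 - sample["Urban Population (%)"] / 100)

# sort_values returns a new DataFrame, so the result must be assigned to keep it.
sample = sample.sort_values("Rural_pop", ascending=False).reset_index(drop=True)
print(sample[["District Name", "Rural_pop"]])
```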
copied_data['Literacy_Category'] = pd.cut(copied_data['Literacy Rate (%)'], bins=3, labels=['Low', 'Medium', 'High'])
plot= sns.kdeplot(x='Rural_pop', hue='Literacy_Category', data=copied_data)
plot.xaxis.set_major_formatter(FuncFormatter(converter))
plt.show()
The distributions peak at a rural population of roughly 1 million. A few outlier districts have a rural population of close to 4 million, and these fall in the medium literacy category. In the plot, the high literacy category generally shows the highest density and the low literacy category the lowest. This might reflect that infrastructure and social policies are easier to deliver in accessible terrain or where the rural population is dense.
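When reading the literacy categories, note that `pd.cut(..., bins=3)` creates three equal-width intervals between the minimum and maximum, not three equally populated groups. The sketch below applies it to the five literacy summary statistics reported by `data.describe()` (min, quartiles, max); these are summary values, not individual districts.

```python
import pandas as pd

# Literacy Rate (%) summary statistics from data.describe(): min, 25%, 50%, 75%, max.
literacy_stats = pd.Series(
    [64.38, 77.20, 81.91, 84.635, 89.91],
    index=["min", "25%", "50%", "75%", "max"],
)

# retbins=True also returns the bin edges that pd.cut computed.
categories, edges = pd.cut(
    literacy_stats, bins=3, labels=["Low", "Medium", "High"], retbins=True
)

print(categories)
print(edges)
```

Since the median (81.91) already lies above the upper cut point (about 81.40), at least half of the districts land in the `High` category, which affects how tall each curve in the KDE plot appears.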
Q.5 Analysing the urban population share across population density levels.
labels = [ 'Low Density', 'Medium Density', 'High Density']
copied_data['binned_data'] = pd.qcut(copied_data['Population Density (per sq. km)'], q=3, labels=labels)
grouped_data = copied_data.groupby('binned_data', observed=True)['Urban Population (%)'].mean()
grouped_data
binned_data
Low Density       21.341667
Medium Density    25.741667
High Density      54.666667
Name: Urban Population (%), dtype: float64
sns.barplot(x='binned_data', y='Urban Population (%)', data=copied_data, hue= 'binned_data', legend=False, palette= 'viridis', errorbar=None)
plt.title('Urban Population in different density')
plt.show()
Intuitively, and with Mumbai's density figures in mind, we can see that high-density districts have a much larger urban population share on average (about 55%, versus about 21% and 26% for the low- and medium-density groups). This is linked to the level of urbanization and topography.
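The density groups here come from `pd.qcut`, which splits on quantiles so each group holds about the same number of districts. With a column this skewed, equal-width binning via `pd.cut` would behave very differently. A small sketch using six density values that appear in the outputs above (five from `data.head()` plus the maximum from `data.describe()`):

```python
import pandas as pd

# Density values seen in the outputs above (a subset, not the full data).
density = pd.Series([603, 287, 329, 290, 504, 20980])
labels = ["Low Density", "Medium Density", "High Density"]

by_quantile = pd.qcut(density, q=3, labels=labels)  # equal-count bins
by_width = pd.cut(density, bins=3, labels=labels)   # equal-width bins

print(by_quantile.value_counts(sort=False))
print(by_width.value_counts(sort=False))
```

Quantile bins keep the groups balanced, whereas equal-width bins put almost everything in the lowest group because a single extreme value stretches the range.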
Q.6 Finding which districts in the state produce each of the top crops.
data.rename(columns= {'Geographical Coordinates (Latitude and Longitude)': 'Coordinates'}, inplace =True)
data['Latitude'] = data['Coordinates'].str.split(', ').str[0].str.strip('° N')
data['longitude'] = data['Coordinates'].str.split(', ').str[1].str.strip('° E')
data.drop(columns= 'Coordinates', inplace=True)
data.head()
| | District Name | District Code | Administrative Division | Headquarters | Number of Talukas | Area (in sq. km) | Population (Census 2011) | Population Density (per sq. km) | Sex Ratio | Literacy Rate (%) | Urban Population (%) | Formation Date | Major River(s) | Major Crop(s) | Key Industries/Economy | Tourist Attractions | Latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Pune | PU | Pune Division | Pune | 14 | 15643 | 9429408 | 603 | 915 | 86.15 | 60.5 | 1 May 1960 | Bhima, Mula, Mutha, Indrayani | Sugarcane, Jowar, Bajra, Grapes, Onions | IT & ITeS, Automotive, Manufacturing, Educatio... | Shaniwar Wada, Aga Khan Palace, Sinhagad Fort,... | 18.52 | 73.85 |
| 1 | Satara | ST | Pune Division | Satara | 11 | 10480 | 3003741 | 287 | 988 | 82.87 | 21.9 | 1 May 1960 | Krishna, Koyna, Venna | Sugarcane, Jowar, Soybean, Turmeric | Agriculture, Wind Power, Sugar Factories, Tourism | Kaas Plateau, Mahabaleshwar, Panchgani, Thoseg... | 17.68 | 74.01 |
| 2 | Sangli | SN | Pune Division | Sangli | 10 | 8572 | 2822143 | 329 | 966 | 81.48 | 25.5 | 1 May 1960 | Krishna, Warana | Sugarcane, Grapes, Turmeric, Jowar | Sugar Production, Turmeric Processing, Textile... | Sagareshwar Wildlife Sanctuary, Chandoli Natio... | 16.85 | 74.58 |
| 3 | Solapur | SO | Pune Division | Solapur | 11 | 14895 | 4317756 | 290 | 938 | 77.02 | 32.4 | 1 May 1960 | Bhima, Sina, Man | Jowar, Sugarcane, Pomegranate, Pulses | Textiles (Chaddars), Sugar Factories, Beedi In... | Siddheshwar Temple, Akkalkot Swami Samarth Mah... | 17.68 | 75.90 |
| 4 | Kolhapur | KO | Pune Division | Kolhapur | 12 | 7685 | 3876001 | 504 | 957 | 81.51 | 31.7 | 1 May 1960 | Panchganga, Krishna, Dudhganga | Sugarcane, Rice, Soybean, Jaggery | Sugar Mills, Foundries, Textiles, Kolhapuri Ch... | Mahalakshmi Temple, Panhala Fort, Jyotiba Temp... | 16.70 | 74.24 |
data['Latitude'] = pd.to_numeric(data['Latitude'])
data['longitude'] = pd.to_numeric(data['longitude'])
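The split-and-strip approach works for this file, but `str.strip('° N')` removes any of the characters `°`, space and `N` from both ends rather than a fixed suffix. A regular expression with named groups is a somewhat stricter alternative; the sketch below assumes every coordinate follows the `"<lat>° N, <lon>° E"` pattern seen in the first rows.

```python
import pandas as pd

# Coordinate strings copied from the first three rows of the table above.
coords = pd.Series(["18.52° N, 73.85° E", "17.68° N, 74.01° E", "16.85° N, 74.58° E"])

# Named groups become column names; rows that do not match would yield NaN.
pattern = r"(?P<Latitude>[\d.]+)°\s*N,\s*(?P<Longitude>[\d.]+)°\s*E"
parsed = coords.str.extract(pattern).astype(float)

print(parsed)
```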
data['Major Crop(s)'].unique()
array(['Sugarcane, Jowar, Bajra, Grapes, Onions',
'Sugarcane, Jowar, Soybean, Turmeric',
'Sugarcane, Grapes, Turmeric, Jowar',
'Jowar, Sugarcane, Pomegranate, Pulses',
'Sugarcane, Rice, Soybean, Jaggery', nan,
'Rice, Vegetables, Fruits', 'Rice, Chickoo, Coconut',
'Rice, Mango, Cashew Nut, Coconut',
'Alphonso Mango, Rice, Cashew Nut, Coconut',
'Alphonso Mango, Cashew Nut, Coconut, Kokum, Rice',
'Grapes, Onions, Pomegranate, Sugarcane',
'Sugarcane, Jowar, Bajra, Pulses',
'Cotton, Jowar, Chilli, Groundnut',
'Banana, Cotton, Jowar, Pulses', 'Jowar, Cotton, Chilli, Maize',
'Cotton, Maize, Jowar, Bajra',
'Sweet Orange (Mosambi), Cotton, Jowar',
'Cotton, Sugarcane, Jowar, Bajra',
'Soybean, Pulses (Tur), Grapes, Jowar',
'Jowar, Soybean, Pulses, Sugarcane',
'Cotton, Jowar, Turmeric, Soybean', 'Jowar, Cotton, Soybean',
'Cotton, Soybean, Jowar, Banana', 'Oranges, Cotton, Soybean, Rice',
'Cotton, Soybean, Pulses (Tur)', 'Rice, Jowar, Pulses',
'Rice, Pulses, Linseed', 'Cotton, Rice, Soybean, Pulses',
'Rice, Tendu Leaves, Bamboo, Mahua',
'Cotton, Soybean, Tur (Pigeon Pea), Oranges',
'Cotton, Jowar, Soybean, Pulses', 'Cotton, Soybean, Pulses',
'Cotton, Jowar, Maize, Soybean', 'Soybean, Cotton, Tur, Jowar'],
dtype=object)
ser1= data['Major Crop(s)'].str.split(', ').explode() # Extracting the unique crops and making a dataframe of theirs
expanded_df = data.loc[ser1.index].copy() #Setting the new dataframe index based on the original data index
expanded_df['Major Crop(s)'] = ser1.values
expanded_df['Major Crop(s)'] = expanded_df['Major Crop(s)'].str.strip()
unique_crop= expanded_df['Major Crop(s)'].value_counts()
sns.barplot(x=unique_crop.values, y= unique_crop.index, palette= 'cool_d', hue=unique_crop.values, legend=False)
plt.title('Major Crops Produced in the state')
plt.show()
Jowar is listed as a major crop by the most districts, which reflects the climatic conditions as well as Maharashtra being the top producer of this crop. However, a wide variety of crops is grown in the state, and some, such as oranges and Alphonso mangoes, are famously associated with it.
Top_4_crops= unique_crop.head(4).index.tolist()
Top_4_crops
['Jowar', 'Cotton', 'Soybean', 'Rice']
Jowar, Cotton, Soybean and Rice are the four most frequently listed crops.
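The counts behind this ranking are the number of districts that list each crop, not production volumes. The split, explode and count steps can be sketched on the first five crop strings from the `unique()` output above, plus one missing value to show that it is ignored:

```python
import pandas as pd

# First five crop strings from the unique() output above, plus one missing entry.
crops = pd.Series([
    "Sugarcane, Jowar, Bajra, Grapes, Onions",
    "Sugarcane, Jowar, Soybean, Turmeric",
    "Sugarcane, Grapes, Turmeric, Jowar",
    "Jowar, Sugarcane, Pomegranate, Pulses",
    "Sugarcane, Rice, Soybean, Jaggery",
    None,
])

# One row per (district, crop) pair; value_counts drops the missing entry.
crop_counts = crops.str.split(",").explode().str.strip().value_counts()
print(crop_counts.head(3))
```

In this five-row subset Sugarcane comes first, whereas the full data ranks Jowar first, which shows how much the ranking depends on which districts are included.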
import json
with open('Maharashtra.geojson', 'r') as f:
    geojson = json.load(f)
for crop in Top_4_crops:
    df = data[data['Major Crop(s)'].str.contains(crop, case=False, na= False)]
    unique_districts = df['District Name'].unique().tolist()
    z_values = list(range(len(unique_districts)))
    # Creating the choropleth map
    fig = go.Figure(go.Choropleth(
        geojson=geojson,
        locations=unique_districts,
        z=z_values,
        colorscale=px.colors.qualitative.Plotly,
        locationmode='geojson-id',
        featureidkey='properties.Dist_Name',
        marker_opacity=0.5,
        marker_line_width=1,
        marker_line_color='black',
        showscale= False # Hide the color bar
    ))
    # For map layout
    fig.update_geos(
        visible=True,
        scope='asia',
        center={"lon": 80, "lat": 22},
        projection_scale=10,
        fitbounds='locations'
    )
    fig.update_layout(
        title=f'{crop} producing major districts'
    )
    fig.show()
Central Maharashtra is the major region in terms of major crop production. Rice cultivation is mostly concentrated in the coastal districts, which lie on the western side of the state, and in the eastern districts. Some districts, such as Nanded, Chandrapur and Nagpur, stand out as specializing in different crops.
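When reading these maps, keep in mind that a choropleth only shades locations whose names match the GeoJSON property given in `featureidkey` exactly, so a spelling difference leaves a district blank without raising an error. The sketch below shows one way to list unmatched names. `toy_geojson` and its spellings are hypothetical stand-ins for `Maharashtra.geojson`; only the `Dist_Name` key is taken from the code above.

```python
# Hypothetical miniature GeoJSON; the real file's spellings are not known here.
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"Dist_Name": "Pune"}, "geometry": None},
        {"type": "Feature", "properties": {"Dist_Name": "Nagpur"}, "geometry": None},
        {"type": "Feature", "properties": {"Dist_Name": "Mumbai"}, "geometry": None},
    ],
}

# District names as they might appear in the CSV (illustrative subset).
district_names = ["Pune", "Nagpur", "Mumbai City"]

# Names present in the data but absent from the GeoJSON would not be drawn.
geo_names = {feature["properties"]["Dist_Name"] for feature in toy_geojson["features"]}
unmatched = sorted(set(district_names) - geo_names)

print("Districts with no matching GeoJSON feature:", unmatched)
```

Running the same comparison with `data['District Name']` and the loaded `geojson` before plotting would reveal whether any district is silently missing from the maps.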
Thank You!